
Conversation

@wangxiyuan (Collaborator) commented Nov 24, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

gemini-code-assist[bot]

This comment was marked as spam.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@wangxiyuan wangxiyuan force-pushed the 4142 branch 2 times, most recently from b4ebebe to cb01cd8 on November 24, 2025 at 02:20
@wangxiyuan wangxiyuan changed the title from "Upgrade to v0.11.2" to "Upgrade vLLM to v0.11.2" Nov 24, 2025
@wangxiyuan wangxiyuan mentioned this pull request Nov 24, 2025
@github-actions github-actions bot added documentation Improvements or additions to documentation module:tests module:core labels Nov 24, 2025
leo-pony and others added 20 commits November 24, 2025 10:58
…tructured outputs compatibility#26866

Signed-off-by: leo-pony <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Signed-off-by: 22dimensions <[email protected]>
Signed-off-by: 22dimensions <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Signed-off-by: 22dimensions <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Signed-off-by: shen-shanshan <[email protected]>
Signed-off-by: wangxiyuan <[email protected]>
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@zhangxinyuehfad (Contributor) commented Nov 24, 2025

@leo-pony
The Multi-Node-Ray test failed. Log:

(EngineCore_DP0 pid=300679) (RayWorkerWrapper pid=300872) INFO 11-24 08:50:32 [__init__.py:106] Registered model loader `<class 'vllm_ascend.model_loader.netloader.netloader.ModelNetLoaderElastic'>` with load format `netloader`
(EngineCore_DP0 pid=300679) (RayWorkerWrapper pid=300872) WARNING 11-24 08:50:33 [worker_base.py:301] Missing `shared_worker_lock` argument from executor. This argument is needed for mm_processor_cache_type='shm'.
(EngineCore_DP0 pid=300679) (RayWorkerWrapper pid=300872) INFO 11-24 08:50:33 [utils.py:973] FLASHCOMM2 not enable.
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] EngineCore failed to start.
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] Traceback (most recent call last):
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 833, in run_engine_core
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 606, in __init__
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     super().__init__(
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 102, in __init__
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self._init_executor()
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/ray_executor.py", line 97, in _init_executor
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self._init_workers_ray(placement_group)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/ray_executor.py", line 370, in _init_workers_ray
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.collective_rpc("init_device")
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/ray_executor.py", line 493, in collective_rpc
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return ray.get(ray_worker_outputs, timeout=timeout)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return fn(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return func(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/worker.py", line 2858, in get
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/ray/_private/worker.py", line 958, in get_objects
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     raise value.as_instanceof_cause()
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] ray.exceptions.RayTaskError(AssertionError): ray::RayWorkerWrapper.execute_method() (pid=300878, ip=172.22.0.188, actor_id=ccad69f02f06cafa8981145201000000, repr=<vllm.v1.executor.ray_utils.RayWorkerWrapper object at 0xffcfbc328810>)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/worker/worker_base.py", line 343, in execute_method
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     raise e
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/worker/worker_base.py", line 332, in execute_method
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     return func(*args, **kwargs)
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/worker/worker_base.py", line 324, in init_device
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.worker.init_device()  # type: ignore
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 236, in init_device
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     self.device = self._init_device()
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]                   ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 220, in _init_device
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]     assert self.parallel_config.local_world_size <= visible_device_count, (
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=300679) ERROR 11-24 08:50:34 [core.py:842] AssertionError: local_world_size (32) must be less than or equal to the number of visible devices (16).
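
The assertion at the end of the log is a per-node capacity check: each node can only host as many workers as it has visible NPUs, and here 32 workers were scheduled onto a node exposing 16 devices. A minimal sketch of that check (the function name and signature are hypothetical; only the assertion message is taken from the traceback):

```python
def check_visible_devices(local_world_size: int, visible_device_count: int) -> None:
    """Hypothetical stand-in for the check in worker_v1.py's _init_device.

    Fails exactly as in the CI log when more workers are placed on a node
    than it has visible devices (e.g. 32 workers vs. 16 NPUs).
    """
    assert local_world_size <= visible_device_count, (
        f"local_world_size ({local_world_size}) must be less than or equal "
        f"to the number of visible devices ({visible_device_count})."
    )
```

Under this reading, the Ray placement group is packing two nodes' worth of workers onto one node, rather than the device count itself being wrong.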

@zhangxinyuehfad (Contributor) commented Nov 24, 2025

@wangxiyuan @MengqingCao
The Multi-Node-DP test failed. Log:

INFO 11-24 09:11:25 [__init__.py:217] Platform plugin ascend is activated
Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)INFO 11-24 09:11:26 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 11-24 09:11:26 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)INFO 11-24 09:11:28 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 11-24 09:11:28 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)INFO 11-24 09:11:29 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 11-24 09:11:29 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(Worker_DP0_TP1_EP1 pid=322215) INFO 11-24 09:11:30 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(Worker_DP0_TP0_EP0 pid=322214) INFO 11-24 09:11:30 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(Worker_DP0_TP6_EP6 pid=322220) INFO 11-24 09:11:31 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(Worker_DP0_TP7_EP7 pid=322221) INFO 11-24 09:11:31 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(Worker_DP0_TP4_EP4 pid=322218) INFO 11-24 09:11:33 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(Worker_DP0_TP5_EP5 pid=322219) INFO 11-24 09:11:33 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(Worker_DP0_TP3_EP3 pid=322217) INFO 11-24 09:11:34 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(Worker_DP0_TP2_EP2 pid=322216) INFO 11-24 09:11:35 [model_runner_v1.py:3746] Loading model weights took 29.0584 GB
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842] EngineCore failed to start.
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842] Traceback (most recent call last):
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 829, in run_engine_core
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]     engine_core = DPEngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 1124, in __init__
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]     super().__init__(
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 606, in __init__
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]     super().__init__(
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 109, in __init__
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 215, in _initialize_kv_caches
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]     kv_cache_specs = self.model_executor.get_kv_cache_specs()
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 129, in get_kv_cache_specs
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]     return self.collective_rpc("get_kv_cache_spec")
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 354, in collective_rpc
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]     while self.futures_queue:
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842]           ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=321405) ERROR 11-24 09:11:35 [core.py:842] AttributeError: 'AscendMultiprocExecutor' object has no attribute 'futures_queue'
(ApiServer_1 pid=321407) Process ApiServer_1:
(ApiServer_1 pid=321407) Traceback (most recent call last):
(ApiServer_1 pid=321407)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(ApiServer_1 pid=321407)     self.run()
(ApiServer_1 pid=321407)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run
(ApiServer_1 pid=321407)     self._target(*self._args, **self._kwargs)
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 247, in run_api_server_worker_proc
(ApiServer_1 pid=321407)     uvloop.run(
(ApiServer_1 pid=321407)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(ApiServer_1 pid=321407)     return runner.run(wrapper())
(ApiServer_1 pid=321407)            ^^^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(ApiServer_1 pid=321407)     return self._loop.run_until_complete(task)
(ApiServer_1 pid=321407)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(ApiServer_1 pid=321407)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(ApiServer_1 pid=321407)     return await main
(ApiServer_1 pid=321407)            ^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 2043, in run_server_worker
(ApiServer_1 pid=321407)     async with build_async_engine_client(
(ApiServer_1 pid=321407)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(ApiServer_1 pid=321407)     return await anext(self.gen)
(ApiServer_1 pid=321407)            ^^^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
(ApiServer_1 pid=321407)     async with build_async_engine_client_from_engine_args(
(ApiServer_1 pid=321407)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(ApiServer_1 pid=321407)     return await anext(self.gen)
(ApiServer_1 pid=321407)            ^^^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 236, in build_async_engine_client_from_engine_args
(ApiServer_1 pid=321407)     async_llm = AsyncLLM.from_vllm_config(
(ApiServer_1 pid=321407)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/utils/func_utils.py", line 116, in inner
(ApiServer_1 pid=321407)     return fn(*args, **kwargs)
(ApiServer_1 pid=321407)            ^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 203, in from_vllm_config
(ApiServer_1 pid=321407)     return cls(
(ApiServer_1 pid=321407)            ^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 133, in __init__
(ApiServer_1 pid=321407)     self.engine_core = EngineCoreClient.make_async_mp_client(
(ApiServer_1 pid=321407)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 120, in make_async_mp_client
(ApiServer_1 pid=321407)     return DPLBAsyncMPClient(*client_args)
(ApiServer_1 pid=321407)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 1176, in __init__
(ApiServer_1 pid=321407)     super().__init__(
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 1017, in __init__
(ApiServer_1 pid=321407)     super().__init__(
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 808, in __init__
(ApiServer_1 pid=321407)     super().__init__(
(ApiServer_1 pid=321407)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 523, in __init__
(ApiServer_1 pid=321407)     raise TimeoutError(
(ApiServer_1 pid=321407) TimeoutError: Timed out waiting for engines to sendinitial message on input socket.
(ApiServer_0 pid=321406) Process ApiServer_0:
(ApiServer_0 pid=321406) Traceback (most recent call last):
(ApiServer_0 pid=321406)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(ApiServer_0 pid=321406)     self.run()
(ApiServer_0 pid=321406)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run
(ApiServer_0 pid=321406)     self._target(*self._args, **self._kwargs)
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 247, in run_api_server_worker_proc
(ApiServer_0 pid=321406)     uvloop.run(
(ApiServer_0 pid=321406)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(ApiServer_0 pid=321406)     return runner.run(wrapper())
(ApiServer_0 pid=321406)            ^^^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(ApiServer_0 pid=321406)     return self._loop.run_until_complete(task)
(ApiServer_0 pid=321406)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(ApiServer_0 pid=321406)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(ApiServer_0 pid=321406)     return await main
(ApiServer_0 pid=321406)            ^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 2043, in run_server_worker
(ApiServer_0 pid=321406)     async with build_async_engine_client(
(ApiServer_0 pid=321406)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(ApiServer_0 pid=321406)     return await anext(self.gen)
(ApiServer_0 pid=321406)            ^^^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
(ApiServer_0 pid=321406)     async with build_async_engine_client_from_engine_args(
(ApiServer_0 pid=321406)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(ApiServer_0 pid=321406)     return await anext(self.gen)
(ApiServer_0 pid=321406)            ^^^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 236, in build_async_engine_client_from_engine_args
(ApiServer_0 pid=321406)     async_llm = AsyncLLM.from_vllm_config(
(ApiServer_0 pid=321406)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/utils/func_utils.py", line 116, in inner
(ApiServer_0 pid=321406)     return fn(*args, **kwargs)
(ApiServer_0 pid=321406)            ^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 203, in from_vllm_config
(ApiServer_0 pid=321406)     return cls(
(ApiServer_0 pid=321406)            ^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 133, in __init__
(ApiServer_0 pid=321406)     self.engine_core = EngineCoreClient.make_async_mp_client(
(ApiServer_0 pid=321406)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 120, in make_async_mp_client
(ApiServer_0 pid=321406)     return DPLBAsyncMPClient(*client_args)
(ApiServer_0 pid=321406)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 1176, in __init__
(ApiServer_0 pid=321406)     super().__init__(
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 1017, in __init__
(ApiServer_0 pid=321406)     super().__init__(
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 808, in __init__
(ApiServer_0 pid=321406)     super().__init__(
(ApiServer_0 pid=321406)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 523, in __init__
(ApiServer_0 pid=321406)     raise TimeoutError(
(ApiServer_0 pid=321406) TimeoutError: Timed out waiting for engines to sendinitial message on input socket.
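
The root failure in the log above is the AttributeError (`'AscendMultiprocExecutor' object has no attribute 'futures_queue'`); the ApiServer timeouts are just the downstream symptom of the engine never coming up. A minimal, hypothetical illustration of this failure mode follows: the class names are borrowed from the traceback, but the bodies are assumptions, not the actual vLLM or vllm-ascend source. The pattern is a plugin subclass whose initialization predates an upstream change and never creates an attribute the new base class relies on:

```python
class MultiprocExecutor:
    """Simplified stand-in for the upstream base executor (hypothetical)."""

    def _init_executor(self) -> None:
        # Attribute newly relied upon by the upstream collective_rpc path.
        self.futures_queue = []

    def collective_rpc(self, method: str):
        # Drain pending futures before dispatching, as in the traceback.
        while self.futures_queue:
            self.futures_queue.pop()
        return method


class AscendMultiprocExecutor(MultiprocExecutor):
    def _init_executor(self) -> None:
        # Plugin override that does its own setup and never calls
        # super()._init_executor(), so futures_queue is never created
        # and the first collective_rpc raises AttributeError.
        pass
```

If this is the cause, the fix would be for the plugin override to call `super()._init_executor()` (or otherwise initialize the new attribute) after the upstream upgrade.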


@wangxiyuan
Collaborator Author

See #4400.

@wangxiyuan wangxiyuan closed this Nov 24, 2025
@wangxiyuan wangxiyuan deleted the 4142 branch December 4, 2025 07:04

5 participants